Quality Estimation of Deep Web Data Sources for Data Fusion
Authors
Abstract
Similar Sources
SemaForm: Semantic Wrapper Generation for Querying Deep Web Data Sources
A wealth of data on the World Wide Web is hidden behind web form query interfaces and cannot be found through regular search engines. Querying across multiple such sources is a tedious and error-prone process; it involves manually filling in many related, but different, web forms. SemaForm automates this process by correlating web form labels to entries in a domain ontology through the use of a...
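The abstract is truncated above, but the core step it describes is correlating raw web form labels with entries in a domain ontology. As a rough, hypothetical illustration of that idea (the ontology, synonyms, and similarity threshold below are invented for the example and are not taken from SemaForm), a matcher might score labels against known synonyms:

```python
import difflib

# Hypothetical domain ontology: concept -> known label synonyms.
# (Illustrative only; SemaForm's actual ontology and matcher are
# not shown in the truncated abstract.)
ONTOLOGY = {
    "departure_city": ["from", "origin", "departure city", "leaving from"],
    "arrival_city":   ["to", "destination", "arrival city", "going to"],
    "travel_date":    ["date", "departure date", "when"],
}

def match_label(form_label: str, threshold: float = 0.6):
    """Map a raw web-form label to the best-matching ontology concept."""
    label = form_label.strip().lower()
    best_concept, best_score = None, 0.0
    for concept, synonyms in ONTOLOGY.items():
        for syn in synonyms:
            score = difflib.SequenceMatcher(None, label, syn).ratio()
            if score > best_score:
                best_concept, best_score = concept, score
    return best_concept if best_score >= threshold else None

print(match_label("Leaving from:"))  # -> departure_city
```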
SPOT-5 Spectral and Textural Data Fusion for Forest Mean Age and Height Estimation
Precise estimation of forest structural parameters supports decision makers in the sustainable management of forests. Moreover, timber volume, and consequently the economic value of a forest, can be estimated from the quantified structural parameters. Mean age and height of the trees are two important parameters for estimating the productivity of plantations. This research ...
Selecting queries from sample to crawl deep web data sources
This paper studies the problem of selecting queries to efficiently crawl a deep web data source using a set of sample documents. Crawling the deep web is the process of collecting data from search interfaces by issuing queries. One of the major challenges in crawling the deep web is selecting the queries so that most of the data can be retrieved at a low cost. We propose to learn a set of querie...
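The abstract is cut off, but the problem it poses reads naturally as a set-cover instance: choose queries whose combined result sets cover the sample documents at low cost. A minimal greedy sketch under that assumption (the toy data and unit-cost model are hypothetical, not the paper's actual method):

```python
def select_queries(candidates, sample_docs):
    """Greedy set cover: repeatedly pick the query that retrieves the most
    not-yet-covered sample documents.

    candidates: dict mapping query -> set of sample doc ids it matches
    sample_docs: set of all sample doc ids
    """
    uncovered = set(sample_docs)
    selected = []
    while uncovered:
        # Cost is modeled as 1 per query here; a real crawler would also
        # weigh result-set size, server limits, etc.
        best = max(candidates, key=lambda q: len(candidates[q] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:  # remaining docs are unreachable by any candidate query
            break
        selected.append(best)
        uncovered -= gain
    return selected

queries = {"data": {1, 2, 3}, "web": {3, 4}, "fusion": {4, 5}}
print(select_queries(queries, {1, 2, 3, 4, 5}))  # -> ['data', 'fusion']
```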
Data Fusion and Data Quality
The recent development of the Internet has made an increasing number of information sources available to users. This makes it necessary to submit queries only to the most appropriate sources. When gathering and combining information from these sources, the quality offered can and must be a criterion for source selection. However, information quality has many dimensions and it is thus difficult to d...
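One standard way to make multi-dimensional quality operational for source selection is a weighted score over per-dimension ratings. The sketch below illustrates that general idea with invented dimensions and weights; the truncated abstract does not specify the paper's own quality model:

```python
# Hypothetical quality dimensions and weights for ranking sources;
# the actual criteria used in the cited paper are not shown here.
WEIGHTS = {"completeness": 0.4, "accuracy": 0.4, "timeliness": 0.2}

def quality_score(source_ratings: dict) -> float:
    """Weighted sum of per-dimension ratings in [0, 1]."""
    return sum(WEIGHTS[d] * source_ratings.get(d, 0.0) for d in WEIGHTS)

sources = {
    "source_a": {"completeness": 0.9, "accuracy": 0.7, "timeliness": 0.5},
    "source_b": {"completeness": 0.6, "accuracy": 0.9, "timeliness": 0.9},
}
# Submit queries only to the highest-scoring sources.
ranked = sorted(sources, key=lambda s: quality_score(sources[s]), reverse=True)
print(ranked)  # -> ['source_b', 'source_a']
```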
Deep Web Data Mining
The World Wide Web (WWW) is broadly divided into two categories: one is the Surface Web, which contains 1% of the web's information content and is crawlable by traditional search engines (like Google, AltaVista, etc.); the other is the Deep Web (or Hidden Web), which contains 99% of the web's information content. Most of this information is held in databases and is not indexed by search engines. As ...
Journal
Journal title: Procedia Engineering
Year: 2012
ISSN: 1877-7058
DOI: 10.1016/j.proeng.2012.01.313